Machine Learning Methods For Chinese Web Page Categorization

نویسندگان

  • Ji He
  • Ah-Hwee Tan
  • Chew Lim Tan
چکیده

This paper reports our evaluation of k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM) on Chinese web page classi cation. Benchmark experiments based on a Chinese web corpus showed that their predictive performance were roughly comparable although ARAM and kNN slightly outperformed SVM in small categories. In addition, inserting rules into ARAM helped to improve performance, especially for small wellde ned categories.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparative Study on Chinese Text Categorization Methods

This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been done on Chinese text categorization. Based on a re-constructed People’s Daily corpus, a series of controlled experiments evaluate three machine learning methods, namely k...

متن کامل

Supervised learning Methods for Bangla Web Document Categorization

This paper explores the use of machine learning approaches, or more specifically, four supervised learning Methods, namely Decision Tree(C 4.5), K-Nearest Neighbour (KNN), Naïve Bays (NB), and Support Vector Machine (SVM) for categorization of Bangla web documents. This is a task of automatically sorting a set of documents into categories from a predefined set. Whereas a wide range of methods h...

متن کامل

A Survey on Information Retrieval, Text Categorization, and Web Crawling

This paper is a survey discussing Information Retrieval concepts, methods, and applications. It goes deep into the document and query modelling involved in IR systems, in addition to pre-processing operations such as removing stop words and searching by synonym techniques. The paper also tackles text categorization along with its application in neural networks and machine learning. Finally, the...

متن کامل

A New Integrated Machine Learning Approach for Web Page Categorization

––Clustering is an unsupervised task whereas classification is supervised in nature. In the context of machine learning, classification of instances of a dataset is carried out by a classifier after the classifier is made to learn the model from a training dataset. The training data consists of instances which are labeled by a human expert. The labels are the classes into which the instances of...

متن کامل

Refined and Incremental Centroid-based approach for Genre Categorization of Web pages

In this paper, I propose a refined and incremental centroid-based approach for genre categorization of web pages. My approach is based on the construction of genre centroids using a set of training web pages. These centroids will be used to classify new web pages. The originality of my approach is the implementation of two new aspects, which are refining and incrementing. My approach is based o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000